Indowordnets help in Indian Language Machine Translation
نویسندگان
چکیده
Being less resource languages, Indian-Indian and English-Indian language MT system developments faces the difficulty to translate various lexical phenomena. In this paper, we present our work on a comparative study of 440 phrase-based statistical trained models for 110 language pairs across 11 Indian languages. We have developed 110 baseline Statistical Machine Translation systems. Then we have augmented the training corpus with Indowordnet synset word entries of lexical database and further trained 110 models on top of the baseline system. We have done a detailed performance comparison using various evaluation metrics such as BLEU score, METEOR and TER. We observed significant improvement in evaluations of translation quality across all the 440 models after using the Indowordnet. These experiments give a detailed insight in two ways : (1) usage of lexical database with synset mapping for resource poor languages (2) efficient usage of Indowordnet sysnset mapping. More over, synset mapped lexical entries helped the SMT system to handle the ambiguity to a great extent during the translation.
منابع مشابه
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملApplication of Linguistic Rules to Generalized Example Based Machine Translation for Indian Languages
This paper shows how certain perceptible linguistic rules help in the process of making Machine Translation simpler and computationally inexpensive when translating from English to a class of Indian Languages. The paper explains the method in detail for Kannada (Language spoken in Karnataka) and can be extended to other Indian Languages. A combination of a simple Generalized Example Based Machi...
متن کاملA Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
متن کاملMTIL17: English to Indian Langauge Statistical Machine Translation
English to Indian language machine translation poses the challenge of structural and morphological divergence. This paper describes English to Indian language statistical machine translation using pre-ordering and suffix separation. The pre-ordering uses rules to transfer the structure of the source sentences prior to training and translation. This syntactic restructuring helps statistical mach...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1710.02086 شماره
صفحات -
تاریخ انتشار 2017